 sketch understanding


Inference Suppose S: X → R is a continuous set function w.r.t. the Hausdorff distance d_H(·,·). ∀ε > 0, for any function f and any invertible map P: X → R^n, ∃ functions h and g such that for any X ∈ X: |S(X) − g(P

Neural Information Processing Systems

Theorem 2. The instances in the bag are represented by random variables Θ1,Θ2,...,Θn, the information entropy of the bag under the correlation assumption can be expressed as H(Θ1,Θ2,...,Θn), and the information entropy of the bag under the i.i.d. Therefore, it is proved that the information source under the correlation assumption has smaller information entropy. In other words, the correlation assumption reduces the uncertainty and brings more useful information. Given a set of bags {X1,X2,...,Xb}, each bag Xi contains multiple instances {xi,1,xi,2,...,xi,n} and a corresponding label Yi. Obviously, the key to Transformer-based MIL is how to design the mapping X → T. However, there are many difficulties in directly applying the Transformer to WSI classification, including the large number of instances in each bag and the large variation in the number of instances across bags (e.g., ranging from hundreds to thousands).
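
The entropy claim in Theorem 2 can be sketched with the textbook chain rule for joint entropy. This is an assumed reconstruction, since the statement above is truncated, and not necessarily the authors' exact proof; the symbols follow the excerpt.

```latex
\begin{align}
H(\Theta_1,\Theta_2,\ldots,\Theta_n)
  &= \sum_{i=1}^{n} H(\Theta_i \mid \Theta_1,\ldots,\Theta_{i-1}) \\
  &\le \sum_{i=1}^{n} H(\Theta_i),
\end{align}
```

Equality holds exactly when the Θi are mutually independent, because conditioning never increases entropy. The joint entropy of correlated instances is therefore no larger than the sum of the marginal entropies that an independence assumption would give.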


ProHD: Projection-Based Hausdorff Distance Approximation

Fu, Jiuzhou, Guo, Luanzheng, Tallent, Nathan R., Zhao, Dongfang

arXiv.org Artificial Intelligence

The Hausdorff distance (HD) is a robust measure of set dissimilarity, but computing it exactly on large, high-dimensional datasets is prohibitively expensive. We propose ProHD, a projection-guided approximation algorithm that dramatically accelerates HD computation while maintaining high accuracy. ProHD identifies a small subset of candidate "extreme" points by projecting the data onto a few informative directions (such as the centroid axis and top principal components) and computing the HD on this subset. This approach guarantees an underestimate of the true HD with a bounded additive error and typically achieves results within a few percent of the exact value. In extensive experiments on image, physics, and synthetic datasets (up to two million points in D=256), ProHD runs 10-100× faster than exact algorithms while attaining 5-20× lower error than random sampling-based approximations. Our method enables practical HD calculations in scenarios like large vector databases and streaming data, where quick and reliable set distance estimation is needed.
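
The projection idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the parameters `n_pcs` and `k`, and the choice to compare each candidate subset against the full opposite set (which is what makes the result a lower bound here) are all assumptions made for illustration.

```python
import numpy as np

def hausdorff(A, B):
    """Exact (brute-force) Hausdorff distance between point sets A and B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def extreme_subset(X, directions, k):
    """Keep the k lowest and k highest points of X along each direction."""
    idx = set()
    for u in directions:
        order = np.argsort(X @ u)
        idx.update(order[:k].tolist())
        idx.update(order[-k:].tolist())
    return X[sorted(idx)]

def directed_to_full(candidates, full):
    """Max over candidates of the distance to the nearest point in `full`."""
    d = np.linalg.norm(candidates[:, None, :] - full[None, :, :], axis=2)
    return d.min(axis=1).max()

def projection_hd(A, B, n_pcs=2, k=10):
    """Hypothetical projection-guided lower bound on the Hausdorff distance."""
    directions = []
    # Centroid axis: unit vector between the two set means (assumed definition).
    c = B.mean(axis=0) - A.mean(axis=0)
    if np.linalg.norm(c) > 0:
        directions.append(c / np.linalg.norm(c))
    # Top principal components of the pooled, centered data.
    pooled = np.vstack([A, B])
    pooled = pooled - pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled, full_matrices=False)
    directions.extend(vt[:n_pcs])
    # Candidates are a subset of each set, so each directed term can only
    # be less than or equal to the exact directed distance.
    a_sub = extreme_subset(A, directions, k)
    b_sub = extreme_subset(B, directions, k)
    return max(directed_to_full(a_sub, B), directed_to_full(b_sub, A))
```

Because the outer maximum runs over a subset while the inner minimum still sees the full opposite set, this sketch can never exceed the exact value. How close it gets depends on whether the true farthest point is among the projected extremes.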


RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

Zhao, Bin, Garg, Nakul

arXiv.org Artificial Intelligence

Millimeter-wave radar provides perception robust to fog, smoke, dust, and low light, making it attractive for size-, weight-, and power-constrained robotic platforms. Current radar imaging methods, however, rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame without motion or SAR. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves 35 cm Chamfer Distance and 28 cm Modified Hausdorff Distance, improving over the single-frame RadarHD baseline (56 cm, 45 cm) and remaining competitive with multi-frame methods using 5-41 frames. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish the first practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.
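
The two point-cloud metrics named in this abstract can be sketched as follows. The definitions are assumptions: Chamfer Distance is taken here as the average of the two mean nearest-neighbor distances, and Modified Hausdorff Distance as the maximum of those two means. The benchmark's exact conventions (for example squared distances or a sum rather than an average) may differ.

```python
import numpy as np

def _nearest_neighbor_dists(A, B):
    """Return (A-to-B, B-to-A) nearest-neighbor Euclidean distances."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1), d.min(axis=0)

def chamfer_distance(A, B):
    """Average of the two mean nearest-neighbor distances (assumed convention)."""
    a2b, b2a = _nearest_neighbor_dists(A, B)
    return 0.5 * (a2b.mean() + b2a.mean())

def modified_hausdorff(A, B):
    """Maximum of the two mean nearest-neighbor distances (assumed convention)."""
    a2b, b2a = _nearest_neighbor_dists(A, B)
    return max(a2b.mean(), b2a.mean())

# Tiny hand-checkable example in 2D.
pred = np.array([[0.0, 0.0], [1.0, 0.0]])
truth = np.array([[0.0, 0.0], [3.0, 0.0]])
cd = chamfer_distance(pred, truth)
mhd = modified_hausdorff(pred, truth)
```

Both metrics average over points, which makes them less sensitive to a single outlier than the classical Hausdorff distance, whose value is set by the single worst point.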


Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS

Nazate-Burgos, Paola, Torres-Torriti, Miguel, Aguilera-Marinovic, Sergio, Arévalo, Tito, Huang, Shoudong, Cheein, Fernando Auat

arXiv.org Artificial Intelligence

Simultaneous localization and mapping (SLAM) for mobile robots remains challenging in forest or arboreal fruit farming environments, where tree canopies obstruct Global Navigation Satellite Systems (GNSS) signals. Unlike indoor settings, these agricultural environments pose additional challenges due to outdoor variables such as foliage motion and illumination variability. This paper proposes a solution based on 2D lidar measurements, which requires less processing and storage, and is more cost-effective, than approaches that employ 3D lidars. Using the modified Hausdorff distance (MHD) metric, the method solves the scan matching problem robustly and with high accuracy without needing sophisticated feature extraction. The method's robustness was validated using public datasets and considering various metrics, facilitating meaningful comparisons for future research. Comparative evaluations against state-of-the-art algorithms, particularly A-LOAM, show that the proposed approach achieves lower positional and angular errors while maintaining higher accuracy and resilience in GNSS-denied settings. This work contributes to the advancement of precision agriculture by enabling reliable and autonomous navigation in challenging outdoor environments.


Hashigo: A Next Generation Sketch Interactive System for Japanese Kanji

Taele, Paul, Hammond, Tracy

arXiv.org Artificial Intelligence

Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human-instructor-level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.


Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding

Forbus, Kenneth D., Chen, Kezhen, Xu, Wangcheng, Usher, Madeline

arXiv.org Artificial Intelligence

One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch, which combines computer vision components into an ensemble to produce sketch-like entities which are then further processed by CogSketch, our model of high-level human vision, to produce both more detailed shape representations and scene representations which can be used for data-efficient learning via analogical generalization. This paper describes our theoretical framework, summarizes several previous experiments, and outlines a new experiment in progress on diagram understanding.


Sampling and Ranking for Digital Ink Generation on a tight computational budget

Afonin, Andrei, Maksai, Andrii, Timofeev, Aleksandr, Musat, Claudiu

arXiv.org Artificial Intelligence

Digital ink (online handwriting) generation has a number of potential applications for creating user-visible content, such as handwriting autocompletion, spelling correction, and beautification. Writing is personal and usually the processing is done on-device. Ink generative models thus need to produce high quality content quickly, in a resource constrained environment. In this work, we study ways to maximize the quality of the output of a trained digital ink generative model, while staying within an inference time budget. We use and compare the effect of multiple sampling and ranking techniques, in the first ablation study of its kind in the digital ink domain. We confirm our findings on multiple datasets - writing in English and Vietnamese, as well as mathematical formulas - using two model types and two common ink data representations. In all combinations, we report a meaningful improvement in the recognizability of the synthetic inks, in some cases more than halving the character error rate metric, and describe a way to select the optimal combination of sampling and ranking techniques for any given computational budget.


Document-Level Multi-Event Extraction with Event Proxy Nodes and Hausdorff Distance Minimization

Wang, Xinyu, Gui, Lin, He, Yulan

arXiv.org Artificial Intelligence

Document-level multi-event extraction aims to extract the structural information from a given document automatically. Most recent approaches usually involve two steps: (1) modeling entity interactions; (2) decoding entity interactions into events. However, such approaches ignore a global view of the inter-dependency of multiple events. Moreover, an event is decoded by iteratively merging its related entities as arguments, which might suffer from error propagation and is computationally inefficient. In this paper, we propose an alternative approach for document-level multi-event extraction with event proxy nodes and Hausdorff distance minimization. The event proxy nodes, representing pseudo-events, are able to build connections with other event proxy nodes, essentially capturing global information. The Hausdorff distance makes it possible to compare the similarity between the set of predicted events and the set of ground-truth events. By directly minimizing the Hausdorff distance, the model is trained towards the global optimum, which improves performance and reduces training time. Experimental results show that our model outperforms previous state-of-the-art methods in F1-score on two datasets with only a fraction of the training time.


Google adds Digital Ink Recognition API for touch and stylus input to ML Kit

#artificialintelligence

A month after announcing changes to ML Kit, its toolset for developers to infuse apps with AI, Google today launched the Digital Ink Recognition API on Android and iOS to allow developers to create apps where stylus and touch act as inputs. As the name implies, the API -- which is powered by the same technology underpinning Google's Gboard software keyboard, Quick Draw, and AutoDraw -- looks at a user's strokes on the screen and recognizes what they're writing or drawing. Google says that with the new Digital Ink Recognition API, developers can enable users to input text and figures with a finger and stylus or transcribe handwritten notes to make them searchable. Classifiers parse written text into a string of characters; other classifiers describe shapes such as drawings, sketches, and emojis by the class to which they belong (e.g., circle, square, happy face, and so on). The Digital Ink Recognition API performs processing in near-real-time and on-device, according to Google, with support for over 300 languages and more than 25 writing systems including all major Latin languages, Chinese, Japanese, Korean, Arabic, and Cyrillic.


Interactive Cognitive Assessment Tools: A Case Study on Digital Pens for the Clinical Assessment of Dementia

Sonntag, Daniel

arXiv.org Artificial Intelligence

Interactive cognitive assessment tools may be valuable for doctors and therapists to reduce costs and improve quality in healthcare systems. Use cases and scenarios include the assessment of dementia. In this paper, we present our approach to the semi-automatic assessment of dementia. We describe a case study with digital pens for the patients including background, problem description and possible solutions. We conclude with lessons learned when implementing digital tests, and a generalisation for use outside the cognitive impairments field.